99 research outputs found

    Instance-Based Matching of Large Life Science Ontologies

    Ontologies are heavily used in the life sciences, so there is increasing value in matching different ontologies to determine related conceptual categories. We propose a simple yet powerful methodology for instance-based ontology matching which utilizes the associations between molecular-biological objects and ontologies. The approach can build on many existing ontology associations for instance objects such as sequences and proteins and thus makes heavy use of available domain knowledge. Furthermore, the approach is flexible and extensible since each instance source with associations to the ontologies of interest can contribute to the ontology mapping. We study several approaches to determine the instance-based similarity of ontology categories. In an extensive experimental evaluation, we use protein associations for different species to match subontologies of the Gene Ontology and OMIM. We also provide a comparison with metadata-based ontology matching.
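
The abstract above does not name the similarity measures it studies. As a minimal sketch only, the following assumes a Dice coefficient over the sets of instances (for example, proteins) associated with each category; the function names, the threshold, and the category and instance identifiers are all hypothetical and chosen for illustration.

```python
from itertools import product


def dice(a: set, b: set) -> float:
    """Dice coefficient of two instance sets (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def match_categories(assoc1: dict, assoc2: dict, threshold: float = 0.5) -> list:
    """Return (category1, category2, similarity) triples at or above threshold.

    assoc1 and assoc2 map a category id to the set of instance ids
    associated with that category. Both the data layout and the default
    threshold are assumptions made for this sketch.
    """
    correspondences = []
    for (c1, inst1), (c2, inst2) in product(assoc1.items(), assoc2.items()):
        sim = dice(inst1, inst2)
        if sim >= threshold:
            correspondences.append((c1, c2, sim))
    # Strongest correspondences first.
    return sorted(correspondences, key=lambda t: t[2], reverse=True)


# Hypothetical associations: category id -> associated instance ids.
ontology_1 = {"cat_a": {"P1", "P2", "P3"}, "cat_b": {"P4"}}
ontology_2 = {"cat_x": {"P2", "P3"}, "cat_y": {"P4", "P5"}}

mapping = match_categories(ontology_1, ontology_2)
```

Because the similarity depends only on shared instances, any additional instance source with associations to both ontologies could be merged into the input dictionaries before matching.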

    Base pair interactions and hybridization isotherms of matched and mismatched oligonucleotide probes on microarrays

    The lack of specificity in microarray experiments due to non-specific hybridization raises a serious problem for the analysis of microarray data because the residual chemical background intensity is not related to the expression degree of the gene of interest. We analyzed the concentration dependence of the signal intensity of perfect match (PM) and mismatch (MM) probes using a microscopic binding model that combines mean hybridization isotherms with single-base-related affinity terms. The signal intensities of the PM and MM probes and their difference are assessed with regard to their sensitivity, specificity and resolution for gene expression measures. The presented theory implies the refinement of existing algorithms of probe level analysis to correct microarray data for non-specific background intensities. Comment: 32 pages, 12 figures, 3 tables

    Evolution of Spliceosomal snRNA Genes in Metazoan Animals

    While studies of the evolutionary histories of protein families are commonplace, little is known about noncoding RNAs beyond microRNAs and some snoRNAs. Here we investigate in detail the evolutionary history of the nine spliceosomal snRNA families (U1, U2, U4, U5, U6, U11, U12, U4atac, and U6atac) across the completely or partially sequenced genomes of metazoan animals. Representatives of the five major spliceosomal snRNAs were found in all genomes. None of the minor spliceosomal snRNAs were detected in nematodes or in the shotgun traces of Oikopleura dioica, while in all other animal genomes at most one of them is missing. Although snRNAs are present in multiple copies in most genomes, distinguishable paralogue groups are not stable over long evolutionary times, even though they appear independently in several clades. In general, animal snRNA secondary structures are highly conserved, although U11 and U12 in insects in particular exhibit dramatic variations. An analysis of the genomic context of snRNAs reveals that they behave like mobile elements, exhibiting very little syntenic conservation.

    An Evolution-based Approach for Assessing Ontology Mappings - A Case Study in the Life Sciences

    Ontology matching has been widely studied. However, the resulting ontology mappings can be rather unstable when the participating ontologies or utilized secondary sources (e.g., instance sources, thesauri) evolve. We propose an evolution-based approach for assessing ontology mappings by annotating their correspondences with information about similarity values for past ontology versions. These annotations allow us to assess the stability of correspondences over time and they can thus be used to determine better and more robust ontology mappings. The approach is generic in that it can be applied independently from the utilized match technique. We define different stability measures and show results of a first evaluation for the life science domain.
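
The abstract above does not define its stability measures. As a hedged illustration of the general idea, the sketch below assumes one possible measure: one minus the mean absolute change in a correspondence's similarity value between successive ontology versions. The function name and the example values are hypothetical.

```python
def average_stability(similarities: list) -> float:
    """Stability of one correspondence from its similarity history.

    similarities holds the similarity values (each in [0, 1]) of the same
    correspondence for successive ontology versions, oldest first.
    Returns 1.0 for a perfectly stable correspondence and lower values
    the more the similarity fluctuates. This formula is an assumption
    made for illustration, not the measure defined in the paper.
    """
    if len(similarities) < 2:
        return 1.0  # No change can be observed with fewer than two versions.
    changes = [abs(later - earlier)
               for earlier, later in zip(similarities, similarities[1:])]
    return 1.0 - sum(changes) / len(changes)


# Hypothetical similarity histories for two correspondences.
steady = average_stability([0.8, 0.8, 0.8])
volatile = average_stability([0.0, 1.0])
```

Under this assumed measure, correspondences could be ranked or filtered by stability in addition to their current similarity value, regardless of which match technique produced the similarities.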

    GAN-Based Approaches for Generating Structured Data in the Medical Domain

    Modern machine and deep learning methods require large datasets to achieve reliable and robust results. This requirement is often difficult to meet in the medical field, due to data sharing limitations imposed by privacy regulations or the presence of a small number of patients (e.g., rare diseases). To address this data scarcity, novel generative models such as Generative Adversarial Networks (GANs) have been widely used to generate synthetic data that mimic real data by representing features that reflect health-related information without reference to real patients. In this paper, we consider several GAN models to generate synthetic data used for training binary (malignant/benign) classifiers, and compare their performance in terms of classification accuracy with cases where only real data are considered. We aim to investigate how synthetic data can improve classification accuracy, especially when a small amount of data is available. To this end, we have developed and implemented an evaluation framework where binary classifiers are trained on extended datasets containing both real and synthetic data. The results show improved accuracy for classifiers trained with generated data from more advanced GAN models, even when limited amounts of original data are available.

    GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution

    Background: Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in life sciences. Their increasing size and the high frequency of updates, which result in a large set of ontology versions, necessitate efficient management and analysis of this data. Results: We present GOMMA, a generic infrastructure for managing and analyzing life science ontologies and their evolution. GOMMA utilizes a generic repository to uniformly and efficiently manage ontology versions and different kinds of mappings. Furthermore, it provides components for ontology matching and for determining evolutionary ontology changes. These components are used by analysis tools, such as the Ontology Evolution Explorer (OnEX) and the detection of unstable ontology regions. We introduce the component-based infrastructure and show analysis results for selected components and life science applications. GOMMA is available at http://dbs.uni-leipzig.de/GOMMA. Conclusions: GOMMA provides a comprehensive and scalable infrastructure to manage large life science ontologies and analyze their evolution. Key functions include a generic storage of ontology versions and mappings, support for ontology matching and determining ontology changes. The supported features for analyzing ontology changes are helpful to assess their impact on ontology-dependent applications such as term enrichment. GOMMA complements OnEX by providing functionalities to manage various versions of mappings between two ontologies and allows combining different match approaches.

    Interview-based assessment of avoidant/restrictive food intake disorder (ARFID): A pilot study evaluating an ARFID module for the Eating Disorder Examination

    Objective: Although avoidant/restrictive food intake disorder (ARFID) has been included as a new diagnostic entity of childhood feeding and eating disorders, there is a lack of measures to reliably and validly assess ARFID. In addition, virtually nothing is known about clinical characteristics of ARFID in nonclinical samples. Method: The present study presents the development and validation of an ARFID module for the child and parent version of the Eating Disorder Examination (EDE) in a nonclinical sample of N = 39 children between 8 and 13 years with underweight and/or restrictive eating behaviors. For evaluating the ARFID module's reliability, the convergence of diagnoses between two independent raters and between the child and parent module was determined. The module's validity was evaluated based on the full-length child version of the EDE, a 24 h food record, parent-reported psychosocial functioning and self-reported quality of life, and objective anthropometric measures. Results: In total, n = 7 children received an ARFID diagnosis. The ARFID module showed high interrater reliability, especially for the parent version, and high convergence between child and parent report. Evidence for the module's convergent, divergent, and discriminant validity was provided. Specifically, children with ARFID reported significantly lower macro- and micronutrient intake than children without ARFID and were more likely to be underweight. Discussion: This pilot study indicates that the child and parent versions of the EDE ARFID module are promising for diagnosing ARFID in a structured way, but the module still requires validation in a larger clinical and community-based sample.

    Towards an Ontology-Based Phenotypic Query Model

    Clinical research based on data from patient or study data management systems plays an important role in transferring basic findings into the daily practices of physicians. To support study recruitment, diagnostic processes, and risk factor evaluation, search queries for such management systems can be used. Typically, the query syntax as well as the underlying data structure vary greatly between different data management systems. This makes it difficult for domain experts (e.g., clinicians) to build and execute search queries. In this work, the Core Ontology of Phenotypes is used as a general model for phenotypic knowledge. This knowledge is required to create search queries that determine and classify individuals (e.g., patients or study participants) whose morphology, function, behaviour, or biochemical and physiological properties meet specific phenotype classes. A specific model describing a set of particular phenotype classes is called a Phenotype Specification Ontology. Such an ontology can be automatically converted to search queries on data management systems. The methods described have already been used successfully in several projects. Using ontologies to model phenotypic knowledge on patient or study data management systems is a viable approach. It allows clinicians to model from a domain perspective without knowing the actual data structure or query language.

    Comparative evaluation of microarray-based gene expression databases

    Microarrays make it possible to monitor the expression of thousands of genes in parallel, thus generating huge amounts of data. So far, several databases have been developed for managing and analyzing this kind of data, but the state of the art in this field is still at an early stage. In this paper, we comprehensively analyze the requirements for microarray data management. We consider the various kinds of data involved as well as data preparation, integration and analysis needs. The identified requirements are then used to comparatively evaluate eight existing microarray databases described in the literature. In addition to providing an overview of the current state of the art, we identify problems that should be addressed in the future to obtain better solutions for managing and analyzing microarray data.